DETAILED ACTION

1. In response to the communication dated 07/27/2006, claims 46-67 are pending in
this application as the result of the addition of claims 55-67.

2. This application is a continuation of application 09/768947, now U.S. Patent No. 6,658,423.

Terminal Disclaimer 

3. The terminal disclaimer filed on 07/27/2006 disclaiming the terminal portion of 
any patent granted on this application which would extend beyond the expiration date has 
been reviewed and is accepted. The terminal disclaimer has been recorded. 

The Double Patenting Rejection is hereby withdrawn. 

Claim Objection 

4. Claim 51 is objected to because of the following informality: the claim depends on
itself. Appropriate correction is required.

Claim Rejections - 35 USC § 101 
35 U.S.C. 101 reads as follows: 

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of 
matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the 
conditions and requirements of this title. 

5. Claims 46-67 are rejected under 35 U.S.C. 101 because the claimed invention is 
directed to non-statutory subject matter. 

As set forth in MPEP 2106(II)A:

Identify and Understand Any Practical Application Asserted for the Invention. The
claimed invention as a whole must accomplish a practical application. That is, it must
produce a "useful, concrete and tangible result." State Street, 149 F.3d at 1373,
47 USPQ2d at 1601-02. The purpose of this requirement is to limit patent protection to
inventions that possess a certain level of "real world" value, as opposed to subject matter
that represents nothing more than an idea or concept, or is simply a starting point for
future investigation or research (Brenner v. Manson, 383 U.S. 519, 528-36, 148 USPQ
689, 693-96 (1966); In re Ziegler, 992 F.2d 1197, 1200-03, 26 USPQ2d 1600, 1603-06 (Fed.
Cir. 1993)). Accordingly, a complete disclosure should contain some indication of the
practical application for the claimed invention, i.e., why the applicant believes the 
claimed invention is useful. 

Apart from the utility requirement of 35 U.S.C. 101, usefulness under the patent 
eligibility standard requires significant functionality to be present to satisfy the useful 
result aspect of the practical application requirement. See Arrhythmia, 958 F.2d at 1057, 
22 USPQ2d at 1036. Merely claiming nonfunctional descriptive material stored in a 
computer-readable medium does not make the invention eligible for patenting. For 
example, a claim directed to a word processing file stored on a disk may satisfy the utility 
requirement of 35 U.S.C. 101 since the information stored may have some "real world" 
value. However, the mere fact that the claim may satisfy the utility requirement of 35 
U.S.C. 101 does not mean that a useful result is achieved under the practical application 
requirement. The claimed invention as a whole must produce a "useful, concrete and 
tangible" result to have a practical application. 

The claimed invention is subject to the test of State Street, 149 F.3d at 1373-74,
47 USPQ2d at 1601-02. Specifically, State Street sets forth that the claimed invention
must produce a "useful, concrete and tangible result". The Interim Guidelines for
Examination of Patent Applications for Patent Subject Matter Eligibility state in section
IV.C.2.b.(2) (on page 21 in the PDF format):

The tangible requirement does not necessarily mean that a claim must either be tied to a particular 
machine or apparatus or must operate to change articles or materials to a different state or thing. 
However, the tangible requirement does require that the claim must recite more than a §101 
judicial exception, in that the process claim must set forth a practical application of that §101 
judicial exception to produce a real-world result. Benson, 409 U.S. at 71-72, 175 USPQ at 676-77 
(invention ineligible because had "no substantial practical application"). 

The claimed invention (claim 46) recites a method for filtering search results to
remove near-duplicates, comprising determining whether the candidate search result is a
near-duplicate of another candidate search result by comparing cluster identifiers, which
does not provide a useful and tangible result. In order for the claim to be tangible, it must
have real-world value rather than being an abstract result. This claim contains software
per se, which is not tangible. Moreover, the claim lacks a practical application, as it does
not set forth how the system would operate if it is determined that the candidate search
result is not a near-duplicate of the other candidate search result.

Claim 47 recites subject matter similar to that set forth above for claim 46, and is thus
rejected on the same ground.

The claimed invention (claims 48 and 67) recites a machine-readable medium having
stored thereon a plurality of records, which does not satisfy the useful result aspect of the
practical application requirement. Merely claiming nonfunctional descriptive material
stored in a machine-readable medium does not make the invention eligible for patenting.
The claims recite a functional descriptive hash function, which is used to hash each of the
elements to determine which one of the plurality of lists each of the elements will be
contained in. However, the determination is performed outside of the machine-readable
medium and has nothing to do with the functional aspect of the medium. Thus, merely
reciting nonfunctional descriptive material (fields and lists) by putting them into memory
does not lead to a practical application.

The claimed invention (claim 49) recites a method for determining whether two
documents are near-duplicates, comprising, for each of the two documents, generating at
least two fingerprints and determining whether or not the two documents are
near-duplicate documents, which does not provide a useful and tangible result. In order for the
claim to be tangible, it must have real-world value rather than being an abstract result.
This claim contains software per se, which is not tangible. Moreover, the claim lacks a
practical application, as it does not set forth how the system would operate if it is
determined that the fingerprint of the first of the two documents does not match the
fingerprint of the second of the two documents.

The claimed invention (claims 50-66) recites a machine-readable medium having
stored thereon a plurality of records, which does not satisfy the useful result aspect of the
practical application requirement. Merely claiming nonfunctional descriptive material
stored in a machine-readable medium does not make the invention eligible for patenting.
Merely reciting nonfunctional descriptive material such as fields and lists by putting them
into memory does not lead to a practical application.

Claim Rejections - 35 USC § 112 
The following is a quotation of the second paragraph of 35 U.S.C. 112:

The specification shall conclude with one or more claims particularly pointing out and distinctly 
claiming the subject matter which the applicant regards as his invention. 

6. Claims 46-47, 49 and 52 are rejected under 35 U.S.C. 112, second paragraph, as
being indefinite for failing to particularly point out and distinctly claim the subject matter 
which applicant regards as the invention. 

Regarding claim 46, the claim recites "that" at line 9, which renders the claim
vague and indefinite because it is unclear what Applicant meant by "that".

Regarding claim 47, there is insufficient antecedent basis for "the candidate 
search result" at lines 6-7 and "the two candidate search results" at line 13 and "the other 
candidate search result" at line 18. Moreover, the claim recites "that" at line 11, which 
renders the claim vague and indefinite because it's unclear what Applicant meant by 
"that". 

Regarding claims 49 and 52, the claims recite the negative limitation "not".

Due to the vagueness and a lack of clear definition of the terminology and phrases 
used in the specification and claims, the claims have been treated on their merits as best 
understood by the examiner. 

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 
A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public 
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States. 

(e) the invention was described in (1) an application for patent, published under section 122(b), by 
another filed in the United States before the invention by the applicant for patent or (2) a patent granted 
on an application for patent by another filed in the United States before the invention by the applicant 
for patent, except that an international application filed under the treaty defined in section 351(a) shall 
have the effects for purposes of this subsection of an application filed in the United States only if the 
international application designated the United States and was published under Article 21(2) of such 
treaty in the English language. 

7. Claim 49 is rejected under 35 U.S.C. 102(e) as being anticipated by Broder (US 
6,119,124). 

Regarding claim 49, Broder discloses a method for determining whether two 
documents are near-duplicates (See col. 4, line 6 et seq.), the method comprising: 

a) for each of the two documents, generating at least two fingerprints (See col. 4, 
lines 19-24, wherein unique identifications of a document can be computed as digital 
fingerprints corresponding to at least two fingerprints for each document); and 

b) determining whether or not the two documents are near-duplicate documents by

i) determining whether or not any one of the at least two fingerprints of a
first of the two documents matches any one of the at least two fingerprints
of a second of the two documents (See col. 10, lines 27-29), and

ii) if it is determined that any one fingerprint of the at least two fingerprints
of the first of the two documents does match any one fingerprint of the at
least two fingerprints of the second of the two documents, then concluding
that the two documents are near-duplicates (See col. 10, lines 27-29).
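
For illustration only, the determination recited in steps (a) and (b) of claim 49 can be sketched as follows. This is a minimal sketch under stated assumptions: the fingerprinting scheme (SHA-1 over lists of words dealt round-robin into two lists) and the names `fingerprints` and `are_near_duplicates` are hypothetical and are taken neither from Broder nor from the application.

```python
import hashlib


def fingerprints(text, num_lists=2):
    """Step (a): generate at least two fingerprints for a document.

    Assumption: words are dealt round-robin into num_lists lists and
    each non-empty list is hashed with SHA-1 to form one fingerprint.
    """
    words = text.lower().split()
    lists = [words[i::num_lists] for i in range(num_lists)]
    return [
        hashlib.sha1(" ".join(lst).encode("utf-8")).hexdigest()
        for lst in lists
        if lst  # skip empty lists so that empty lists never match each other
    ]


def are_near_duplicates(doc_a, doc_b):
    """Step (b): conclude near-duplicate if ANY fingerprint of doc_a
    matches ANY fingerprint of doc_b."""
    return bool(set(fingerprints(doc_a)) & set(fingerprints(doc_b)))
```

Under this hypothetical scheme, two documents that differ only in the words falling into one list still share the fingerprint of the other list, and so would be treated as near-duplicates.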

8. Claims 50-54, 58-60, 61-63, and 64-66 are rejected under 35 U.S.C. 102(b) as 
being anticipated by Johnson (U.S. Patent No. 5,850,490). 

Regarding claims 50-54, Johnson discloses a machine-readable medium having 
stored thereon a plurality of records (See Fig. 16), each of the records comprising: 

a) a first field for storing a document identifier (Document Identifier field 
0001, Fig. 16); and 

b) a plurality of lists (Segment class 1, segment class 2, segment class 3, ...,
Fig. 16), each of the plurality of lists containing elements of a document
identified by the document identifier stored in the first field ("Cowherds of
the Deep" for example, Fig. 16, wherein the words in "Cowherds of the
Deep" correspond to elements of a document).

Johnson teaches a plurality of records organized into a table, in which each record reflects
a document (Fig. 16, entry 452) and each record has a plurality of segment classes
(lists). Since a document may not have the same keywords as other documents,
some of the Segment classes 1, 2, 3 include no keyword (element). Thus,
Johnson teaches wherein at least some of the plurality of lists include different numbers
of elements and wherein at least one of the plurality of lists includes no elements, as per
claims 50 and 51.

Johnson teaches wherein contiguous elements in a document are not necessarily
contiguous elements of a list, as one having ordinary skill in the art would have
recognized that the table in Fig. 16 stores a plurality of Segment classes for keywords in
documents; thus keywords in a document are not necessarily contiguous in key fields, as per
claim 52.

Johnson teaches wherein, for each of the records, the number of lists is the same, as
each of the records has the same number of segment classes, as per claim 53. Whether or
not a document has the same keywords as other documents, the document records still
have the same number of segment classes (lists), ranging from 1 to ..., in the table;
therefore, Johnson teaches wherein a number of the plurality of lists is independent of
document size, as per claim 54.

Regarding claims 58, 61 and 64, Johnson discloses wherein each of the
elements of a document is an element that has been extracted from the document (See 
col. 20, lines 20-31). 

Regarding claims 59, 62 and 65, Johnson discloses wherein each of the elements 
of a document is a predetermined one of (A) a predetermined number of words (See col. 
20, lines 28-31, for example, "It was a dark"), (B) a predetermined number of sentences, 
(C) a predetermined number of characters, (D) a predetermined number of paragraphs, 
and (E) a predetermined number of sections.

Regarding claims 60, 63 and 66, Johnson discloses wherein each of the elements
of a document partially overlaps another of the elements of the document (See col. 20,
lines 28-31).

Claim Rejections - 35 USC § 103 
The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

9. Claim 48 is rejected under 35 U.S.C. 103(a) as being unpatentable over Johnson 
(U.S. Patent No. 5,850,490), in view of Fujiwara (U.S. Patent No. 6,381,601). 

Regarding claim 48, Johnson discloses a machine-readable medium having stored 
thereon a plurality of records (See Fig. 16), each of the records comprising: 

a) a first field for storing a document identifier (Document Identifier field 
0001, Fig. 16); and 

b) a plurality of lists (Segment class 1, segment class 2, segment class 3, ...,
Fig. 16), each of the plurality of lists containing elements of a document
identified by the document identifier stored in the first field ("Cowherds of
the Deep" for example, Fig. 16, wherein the words in "Cowherds of the
Deep" correspond to elements of a document).

However, Johnson is silent as to wherein a hash function is used to hash each of the
elements in order to determine which of the plurality of lists each of the elements
will be contained in. On the other hand, Fujiwara teaches using a hash function to hash
each of the elements in order to determine which of the plurality of lists each of the
elements will be contained in (See col. 2, lines 57-62, col. 4, lines 47-62, col. 5, line 50 to
col. 6, line 10, Fujiwara et al.). It would have been obvious to one having ordinary skill in
the art at the time the invention was made to use a hash function to hash each of the
elements in order to determine which of the plurality of lists each of the elements will
be contained in. The motivation would have been to reduce or remove duplicate
elements by using a hash function.
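
For illustration only, the limitation of claim 48 (a record having a document identifier field and a plurality of lists, with a hash function determining which list each element is placed in) can be sketched as follows. This is a minimal sketch under stated assumptions: the use of MD5, the modulo step, the fixed count of four lists, and the names `list_index` and `build_record` are hypothetical and are not taken from Johnson, Fujiwara, or the application.

```python
import hashlib


def list_index(element, num_lists):
    """Hash an element to choose one of num_lists lists (assumed scheme:
    first four bytes of the MD5 digest, reduced modulo num_lists)."""
    digest = hashlib.md5(element.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_lists


def build_record(doc_id, elements, num_lists=4):
    """Build one record: a document identifier field plus a fixed number of lists."""
    lists = [[] for _ in range(num_lists)]
    for element in elements:
        # The hash of the element, not its position in the document,
        # decides which list the element is contained in.
        lists[list_index(element, num_lists)].append(element)
    return {"doc_id": doc_id, "lists": lists}


record = build_record("0001", ["cowherds", "of", "the", "deep"])
```

In this sketch the number of lists is fixed regardless of document size, individual lists may hold different numbers of elements or none at all, and elements that are contiguous in the document need not be contiguous within a list.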

10. Claims 48, 55-57, and 67 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Judd (U.S. Patent No. 6,360,215), in view of Fujiwara (U.S. Patent No. 
6,381,601). 

Regarding claim 48, Judd discloses a machine-readable medium having stored 
thereon a plurality of records (Index 16, Fig. 1 and col. 6, lines 47-48 and lines 66-67), 
each of the records comprising: 

a) a first field for storing a document identifier (col. 7, lines 45-46,
wherein the location identifier of the current document corresponds to the "document
identifier" and the document index record corresponds to the "record"); and

b) a plurality of lists, each of the plurality of lists containing elements of a
document identified by the document identifier stored in the first field
(See col. 7, lines 45-50, wherein columns (lists) of the record contain values
of properties (elements of the document) such as document title and document
summary. One having ordinary skill in the art would have recognized that a
document title or document summary would contain words, characters or
sentences),

Judd teaches an MD5 hash function (See col. 7, line 65 to col. 8, line 9). However, Judd is
silent as to wherein a hash function is used to hash each of the elements in order to
determine which of the plurality of lists each of the elements will be contained in.
On the other hand, Fujiwara teaches using a hash function to hash each of the elements in
order to determine which of the plurality of lists each of the elements will be
contained in (See col. 2, lines 57-62, col. 4, lines 47-62, col. 5, line 50 to col. 6, line 10,
Fujiwara et al.). It would have been obvious to one having ordinary skill in the art at the
time the invention was made to use a hash function to hash each of the elements in
order to determine which of the plurality of lists each of the elements will be
contained in. The motivation would have been to reduce or remove duplicate elements
by using a hash function.

Regarding claim 55, Judd/Fujiwara discloses wherein each of the elements of a 
document is an element that has been extracted from the document (See col. 7, lines 48- 
50, Judd et al.). 

Regarding claim 56, Judd/Fujiwara discloses wherein each of the elements of a 
document is a predetermined one of (A) a predetermined number of words (See col. 7, 
lines 48-50, "document title", "document summary", Judd et al.), (B) a predetermined 
number of sentences (See col. 7, lines 48-50, "document summary", Judd et al.), (C) a
predetermined number of characters, (D) a predetermined number of paragraphs, and (E)
a predetermined number of sections.

Regarding claim 57, Judd/Fujiwara discloses wherein each of the elements of a 
document partially overlaps another of the elements of the document (See col. 7,
lines 47-50, Judd et al.).

Regarding claim 67, this claim recites subject matter similar to that set forth above in
claim 48, and is thus rejected on similar grounds.

11. Claims 50-54, 58-60, 61-63, and 64-66 are rejected under 35 U.S.C. 103(a) as 
being unpatentable over Bates (U.S. Patent No. 6,873,982), in view of Johnson (U.S. 
Patent No. 5,850,490). 

Regarding claims 50-54, Bates discloses a machine-readable medium having 
stored thereon a plurality of records (See Fig. 4), each of the records comprising: 

a) a first field for storing a document identifier (Document Identifier field 
102, Fig. 4); and 

b) a plurality of lists (Key 1 ... Key N, reference 106, Fig. 4), each of the
plurality of lists containing an element of a document identified by the
document identifier stored in the first field (See col. 9, lines 1-3).

Bates teaches each of the plurality of lists containing one element of a document.
However, Bates is silent as to each of the plurality of lists containing elements of a
document. On the other hand, Johnson teaches each of the plurality of lists containing
elements of a document (See Fig. 16, and corresponding text, Johnson). It would have
been obvious to one having ordinary skill in the art at the time the invention was made
to have each of the plurality of lists contain more than one element of a document, as
suggested by Johnson, because the differences are found only in the nonfunctional
descriptive material and do not alter how the elements of the system function. Thus, this
descriptive material will not distinguish the claimed invention from the prior art in terms
of patentability, see In re Gulack, 703 F.2d 1381, 217 USPQ 401, 404 (Fed. Cir. 1983);
In re Lowry, 32 F.3d 1579, 32 USPQ2d 1031 (Fed. Cir. 1994).

Bates teaches a plurality of records organized into a table, in which each record reflects a
document (Fig. 4 and col. 6, lines 33-45) and each record has a plurality of
keyword fields (lists). Since a document may not have the same keywords as other
documents, some of the key fields 106 include no keyword (element). Thus,
Bates teaches wherein at least some of the plurality of lists include different numbers of
elements and wherein at least one of the plurality of lists includes no elements, as per
claims 50 and 51.

Bates teaches wherein contiguous elements in a document are not necessarily
contiguous elements of a list, as one having ordinary skill in the art would have
recognized that the table in Fig. 4 stores a plurality of key fields for keywords in documents;
thus keywords in a document are not necessarily contiguous in key fields, as per claim 52.

Bates teaches wherein, for each of the records, the number of lists is the same, as
each of the records has the same number of key 1 to key N fields, as per claim 53. Whether
or not a document has the same keywords as other documents, the document records still
have the same number of key fields (lists), ranging from 1 to N, in the table; therefore,
Bates teaches wherein a number of the plurality of lists is independent of document size,
as per claim 54.

Regarding claims 58, 61 and 64, Bates/Johnson discloses wherein each of the 
elements of a document is an element that has been extracted from the document (See 
col. 9, lines 2-3, Bates et al.).

Regarding claims 59, 62 and 65, Bates/Johnson discloses wherein each of the
elements of a document is a predetermined one of (A) a predetermined number of words,
(B) a predetermined number of sentences, (C) a predetermined number of characters
(a word contains predetermined characters, col. 9, lines 2-3), (D) a predetermined number of
paragraphs, and (E) a predetermined number of sections.

Regarding claims 60, 63 and 66, Bates/Johnson discloses wherein each of the 
elements of a document partially overlaps another of the elements of the document (See 
Fig. 4). 

12. Claims 48, 55-56 and 67 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Bates (U.S. Patent No. 6,873,982), in view of Johnson (U.S. Patent No. 
5,850,490), and further in view of Fujiwara (U.S. Patent No. 6,381,601). 

Regarding claim 48, Bates discloses a machine-readable medium having stored
thereon a plurality of records (See Fig. 4), each of the records comprising:

a) a first field for storing a document identifier (Document Identifier field
102, Fig. 4); and

b) a plurality of lists (Key 1 ... Key N, reference 106, Fig. 4), each of the
plurality of lists containing an element of a document identified by the
document identifier stored in the first field (See col. 9, lines 1-3).

Bates teaches each of the plurality of lists containing one element of a document.
However, Bates is silent as to each of the plurality of lists containing elements of a
document. On the other hand, Johnson teaches each of the plurality of lists containing
elements of a document (See Fig. 16, and corresponding text, Johnson). It would have
been obvious to one having ordinary skill in the art at the time the invention was made
to have each of the plurality of lists contain more than one element of a document, as
suggested by Johnson, because the differences are found only in the nonfunctional
descriptive material and do not alter how the elements of the system function. Thus, this
descriptive material will not distinguish the claimed invention from the prior art in terms
of patentability, see In re Gulack, 703 F.2d 1381, 217 USPQ 401, 404 (Fed. Cir. 1983);
In re Lowry, 32 F.3d 1579, 32 USPQ2d 1031 (Fed. Cir. 1994).

Bates is silent as to wherein a hash function is used to hash each of the elements in
order to determine which of the plurality of lists each of the elements will be
contained in. On the other hand, Fujiwara teaches using a hash function to hash each of
the elements in order to determine which of the plurality of lists each of the elements
will be contained in (See col. 2, lines 57-62, col. 4, lines 47-62, col. 5, line 50 to col. 6,
line 10, Fujiwara et al.). It would have been obvious to one having ordinary skill in the art
at the time the invention was made to use a hash function to hash each of the elements
in order to determine which of the plurality of lists each of the elements will be
contained in. The motivation would have been to reduce or remove duplicate elements
by using a hash function.

Regarding claim 55, Bates/Johnson/Fujiwara discloses wherein each of the 
elements of a document is an element that has been extracted from the document (See 
col. 9, lines 2-3, Bates et al.).

Regarding claim 56, Bates/Johnson/Fujiwara discloses wherein each of the
elements of a document is a predetermined one of (A) a predetermined number of words,
(B) a predetermined number of sentences, (C) a predetermined number of characters
(a word contains predetermined characters, col. 9, lines 2-3), (D) a predetermined number of
paragraphs, and (E) a predetermined number of sections.

Regarding claim 67, this claim recites subject matter similar to that set forth above in
claim 48, and is thus rejected on similar grounds.

Response to Arguments 
13. Applicant's arguments with respect to claims 46-67 have been considered but are 
moot in view of the new ground(s) of rejection. 

Conclusion

14. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. 

Crow, U.S. Patent No. 6,665,661, discloses a system and method for use in text
analysis of documents and records.

15. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Merilyn P Nguyen whose telephone number is
571-272-4026. The examiner can normally be reached on M-F: 8:30 - 5:00.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Don Wong, can be reached on 571-272-1834. The fax phone numbers for the
organization where this application or proceeding is assigned are 571-273-8300 for 
regular communications and 703-746-7240 for After Final communications. 

Any inquiry of a general nature or relating to the status of this application or 
proceeding should be directed to the receptionist whose telephone number is
703-305-3900.